Introduction

Preventing unintended pregnancy is important for both individual wellbeing and public health. Unprotected sex occurs for multiple, complex reasons including sexual assault, reproductive coercion, contraceptive mishaps, and lapses in contraceptive use. Emergency contraception is used to prevent pregnancy following unprotected sex or contraceptive failure. In Scotland emergency contraception can be obtained via a pill or intrauterine device.

Currently in Scotland there are two types of emergency contraceptive pill (ECP) prescribed by pharmacies. These are:

  • Levonorgestrel (brand name: Levonelle) - Effective within 72 hours (three days) of unprotected sex
  • Ulipristal Acetate (brand name: ellaOne) - Effective within 120 hours (five days) of unprotected sex

This analysis explores prescribing patterns of ECP using the Public Health Scotland Community Prescriptions Database from 2019 to 2022. Understanding the prescribing patterns of ECP is essential for identifying disparities in access and improving public health strategies. This report aims to identify factors influencing ECP prescribing in Scotland through answering:

  1. What are the trends in prescribing of ECP between 2019 and 2022?
  2. How does the ECP prescription rate vary across Health Board regions in Scotland?
  3. Are pharmacies in more deprived areas prescribing more ECP than those in affluent areas?
  4. Is there geographical variation in prescribing practices of ECP by Health Board region?
# load necessary libraries
library(tidyverse)
library(janitor) 
library(gt) 
library(here)
library(lubridate)
library(sf)
library(ggspatial)
library(plotly)

Read in datasets:

# Read in the Health Board names (HB_names). See code for link.
HB_names <- read_csv("https://www.opendata.nhs.scot/dataset/9f942fdb-e59e-44f5-b534-d6e17229cc7b/resource/652ff726-e676-4a20-abda-435b98dd7bdc/download/hb14_hb19.csv") %>% 
  clean_names() # ensures column names are unique, lower case and spaces and special characters are replaced with underscores. I will use this function for all read ins to ensure my data is consistent and easy to manipulate.

# Read in population data per Health Board from the 2022 census data. Available from: https://statistics.ukdataservice.ac.uk/dataset/scotland-s-census-2022-uv102a-age-by-sex/resource/b2d295c2-af53-4b3d-a075-7815cadd9060 
all_population_data <- read_csv(here("data", "UV103_age_health_board_census.csv"), skip = 10) %>% # locates the csv and excludes the first 10 lines in the csv as they are redundant
  rename(Spare = "...6", # remove unused columns
         hb_name = "Health Board Area 2019",
         hb_population = Count) %>% # rename() formats the data to match the prescriptions dataframe
  filter(Sex == "Female") %>% # filter female population, as men do not take ECP
  select(hb_name, Age, hb_population) %>% # select columns of interest
  mutate(hb_name = paste("NHS", hb_name)) %>%  # change hb_name column format to match the Health Board dataframe format
  clean_names()

# filter dataset for the total female population for each health board:
population_data <- all_population_data %>% 
  filter(age == "All people") %>% 
  select(hb_name, hb_population)

# select the population aged 16 to 34 years as this is the population the population most likely engaging with risk taking behaviors such as unprotected sex. I chose 16 as this is the age of consent in Scotland, and 34 as my cut off for young adults. 
young_population_data <- all_population_data %>% 
  filter(age%in% c(16:34)) %>% # filter ages of interest
  mutate(age = as.numeric(age)) %>% # make numerical so they can be placed in to buckets
  mutate(age_group = case_when(between(age, 16,34)~'16-34')) %>% # make bucket 16-34 years
  group_by(hb_name, age_group) %>% 
  summarise(pop_hb_16_34 = sum(hb_population)) # sum total population aged 16 to 34 per health board region. 

Define a function to read in the prescription data for a defined year:

#For efficiency I created a function to read in the prescriptions datasets. I downloaded 12 months of prescription data for each year from 2019 to 2022. I placed the 12 months of data into their relevant folder named all_months_year, where year was specific to the data it contained.
read_all_prescriptions <- function(year){
  all_files <- list.files(here("data", paste0("all_months_", year)), pattern = "csv") #list.files() retrieves files from the relevant directory, and the paste0() dynamically constructs the folder name based on the year variable placed into the function.
  all_prescriptions <- all_files %>%
    map_dfr(~read_csv(here("data", paste0("all_months_", year),.))) %>% #map_dfr() row-binds the datasets being read in
    clean_names() %>%
    drop_na(bnf_item_description) # drop the rows with missing bnf_item values
  return(all_prescriptions)}

Read in datasets using read_all_prescriptions() function and start data wrangling:

# all_prescriptions_2019 <- read_all_prescriptions(2019)
# all_prescriptions_2020 <- read_all_prescriptions(2020)
# all_prescriptions_2021 <- read_all_prescriptions(2021)
# all_prescriptions_2022 <- read_all_prescriptions(2022)
# 
# # 2019 dataset has hbt2014 as a column name instead of hbt. Rename to make column name consistent
# all_prescriptions_2019 <- all_prescriptions_2019 %>%
#   rename(hbt = "hbt2014")
# 
# # combine 4 years of data into one dataset to make it easier to wrangle
# combined_prescriptions <- bind_rows(
#   all_prescriptions_2019,
#   all_prescriptions_2020,
#   all_prescriptions_2021,
#   all_prescriptions_2022 )%>% # I mutate and select the prescriptions of interest to prevent a very large dataset from being stored in my environment (prevents R slowing down and crashing)
#   mutate(
#     date = parse_date_time(paid_date_month, "ym"), # use lubridate to format date
#     drug_simple = case_when(
#       str_detect(bnf_item_description, "LEVONO") ~ "Levonorgestrel",
#       str_detect(bnf_item_description, "ULIPR") ~ "Ulipristal Acetate",
#       TRUE ~ "Other")) %>% # used case_when() to select ECP even when they have different dosages
#   filter(drug_simple != "Other",!is.na(date)) %>% #remove prescriptions of no interest and any missing date values
#   filter(hbt != "SB0806") %>% # filter out SB0806 as it is not a health board (Scottish Ambulance Service)
#   filter(!is.na(hbt))

# save combined_prescriptions dataset to a csv in my directory, to prevent many large datasets being stored in my environment and slowing down R. 
# write_csv(combined_prescriptions,"data/combined_prescriptions.csv")
combined_prescriptions <- read_csv(here("data","combined_prescriptions.csv"))

Join and wrangle data:

# join the prescriptions dataset to the Health Board names and health board population datasets
ECP_scripts <- combined_prescriptions %>% 
  full_join(HB_names, by = c("hbt" = "hb")) %>% # join with Health Board names
  full_join(population_data, by = "hb_name") %>% # join with population data
  select(gp_practice, date, drug = drug_simple, hb_name, paid_quantity,hb_population) %>%  # select columns of interest
  mutate(month = factor(month(date), levels = 1:12, labels = month.abb), .after = date) %>%   
  mutate(year = factor(year(date)), .after = month) # extract month and year as a factor with labels to help when plotting

Key Results

How does the ECP prescription rate vary across Health Board regions in Scotland?

This figure explores if there are differences in the average annual prescription rate for each Health Board in Scotland. To explore possible reasons for differences I calculated the proportion of young people in each Health Board to see if there was a trend.

Wrangle data to get the columns and values of interest:

ECP_anual_rate_data <- ECP_scripts %>% 
  group_by(hb_name, drug) %>% # aggregate data by Health Board and drug
  summarise(total_quantity_4_years = sum(paid_quantity, na.rm = TRUE), # total prescriptions over 4 years
    avg_annual_total_quantity = total_quantity_4_years / 4, # average annual prescriptions
    hb_population = first(hb_population), # hb_population is consistent within each hb_name
    .groups = "drop") %>% # ungroup data
  drop_na(drug) %>% # remove rows with no drug values
  mutate(avg_annual_presc_100000 = avg_annual_total_quantity * 100000 / hb_population) %>% # average annual rate of prescriptions per 100,000
  select(hb_name, drug, avg_annual_presc_100000, hb_population) %>% # select columns of interest before pivot
  pivot_wider(names_from = drug, values_from = avg_annual_presc_100000, names_glue = "{drug}_rate") %>% # Rename columns for clarity
  clean_names() # clean column names following pivot

# calculate total ECP prescription rate per health board
ECP_anual_rate_data <- ECP_anual_rate_data %>% 
  mutate(total_ECP_rate = rowSums(select(., levonorgestrel_rate, ulipristal_acetate_rate), na.rm = TRUE)) %>%  #sum rates 
  arrange(desc(total_ECP_rate)) # arrange by total rate in descending order

#calculate percentage young people per health board 
ECP_anual_rate_data_young <- ECP_anual_rate_data %>% 
  full_join(young_population_data) %>% # to get the population of those aged 16-34 years 
  group_by(hb_name) %>% 
  mutate(prop_young_ppl_hb = pop_hb_16_34/hb_population) %>% # calculate proportion of young people in each health board 
  ungroup()

Table 1:

annual_avg_ECP_table <- ECP_anual_rate_data_young %>% 
  select(hb_name, levonorgestrel_rate, ulipristal_acetate_rate, total_ECP_rate,prop_young_ppl_hb) %>% # Select relevant columns
  gt() %>% 
  cols_label(hb_name = "Health Board",
             total_ECP_rate= "Total",
             levonorgestrel_rate= " Levonorgestrel",
             ulipristal_acetate_rate=" Ulipristal Acetate",
             prop_young_ppl_hb = "% Aged 16-34 Years") %>% # rename columns to make reader-friendly
  fmt_number(columns = c(levonorgestrel_rate, ulipristal_acetate_rate, total_ECP_rate, prop_young_ppl_hb), decimals = 0) %>% # no decimal points as false accuracy detracts from the message in the data
  cols_align(align = "center", columns = c(levonorgestrel_rate,ulipristal_acetate_rate,total_ECP_rate, prop_young_ppl_hb)) %>%  # centre column names
  grand_summary_rows(columns = c(levonorgestrel_rate,ulipristal_acetate_rate, total_ECP_rate), fns = list("Overall Average" = ~mean(., na.rm = TRUE)), fmt = list(~ fmt_number(., decimals = 0))) %>% # add an overall average for the rate columns
  fmt_percent(columns = prop_young_ppl_hb, decimals = 0) %>%  # add percentage sign
  tab_header(title = md("**Average Annual Rate of Emergency Contraception Prescriptions by Health Board in Scotland.**"),
             subtitle = md("Rate per 100,000 women, derived from the mean prescription rates across the years 2019 to 2022. Health Boards are ranked in descending order.")) %>% # add a title and subtitle; md() allows text formatting from mark down 
  tab_spanner(label = md("*Prescription rate per 100,000 women*"), columns = c(levonorgestrel_rate,ulipristal_acetate_rate, total_ECP_rate)) %>% # add a title to the rate columns.
  tab_source_note(md("*Data from Public Health Scotland. Available from: (https://www.opendata.nhs.scot/dataset/prescriptions-in-the-community)*")) %>% 
  tab_stubhead(md("**2019-2022**")) %>% 
  tab_footnote(footnote = "includes Capital City, Edinburgh", locations = cells_body(columns = hb_name, rows = 2))%>% 
  opt_stylize(style = 6, color = "blue")

annual_avg_ECP_table
Average Annual Rate of Emergency Contraception Prescriptions by Health Board in Scotland.
Rate per 100,000 women, derived from the mean prescription rates across the years 2019 to 2022. Health Boards are ranked in descending order.
2019-2022 Health Board
Prescription rate per 100,000 women
% Aged 16-34 Years
Levonorgestrel Ulipristal Acetate Total
NHS Greater Glasgow and Clyde 3,376 29 3,404 26%
NHS Lothian1 2,495 56 2,551 27%
NHS Ayrshire and Arran 2,027 15 2,043 19%
NHS Tayside 1,365 22 1,387 22%
NHS Forth Valley 1,268 73 1,342 22%
NHS Grampian 1,210 16 1,226 22%
NHS Fife 1,049 54 1,103 22%
NHS Lanarkshire 784 48 832 22%
NHS Shetland 779 17 796 19%
NHS Orkney 609 7 616 17%
NHS Dumfries and Galloway 515 24 539 17%
NHS Western Isles 192 197 389 16%
NHS Highland 285 52 337 17%
NHS Borders 242 76 318 17%
Overall Average — 1,157 49 1,206 —
Data from Public Health Scotland. Available from: (https://www.opendata.nhs.scot/dataset/prescriptions-in-the-community)
1 includes Capital City, Edinburgh

Generally the health boards with the highest annual ECP prescription rate also had the highest proportion of young people living in their health board region. This suggests that young people are more likely using the ECP pharmacy service. Notably the top two Health Boards for prescribing ECP contain major Scottish cities, Glasgow and Edinburgh.

NHS Ayrshire and Arran is an outlier to this trend having the third highest ECP prescription rate, and yet a relatively smaller proportion of young people living in this health board (19%). This needs further exploration, perhaps indicating that under 16s or over 35s are using the ECP services more in this region.

More remote Health Boards such as NHS Highland and NHS borders had the lowest rate of ECP prescribing. They also had a smaller proportion of young people living in their health board.

When comparing the types of ECP prescribed generally levonorgestrel prescribed significantly more than Ulipristal Acetate. However this is not the case in the NHS Western Isles Health Board, where a similar proportion of Levonorgestrel is prescribed to Ulipristal Acetate. This trend is explored more in figure 4 to see whether there is geographical variation in the type of ECP prescribed.

Are pharmacies in more deprived areas prescribing more ECP than those in affluent areas?

This figure explores if

To measure deprivation I have used the Scottish Index of Multiple Deprivation. I took the SIMD 2020v2 dataset from Public Health Scotland website [available from: https://www.opendata.nhs.scot/gl/dataset/scottish-index-of-multiple-deprivation/resource/acade396-8430-4b34-895a-b3e757fa346e ] as this dataset contains SIMD decile weighted per Health Board population.

When interpreting SIMD in deciles rank 1 is considered the most deprived, and rank 10 is least deprived.

# read in datasets: (link available in code)
SIMD <- read_csv("https://www.opendata.nhs.scot/gl/dataset/78d41fa9-1a62-4f7b-9edb-3e8522a93378/resource/acade396-8430-4b34-895a-b3e757fa346e/download/simd2020v2_22062020.csv") %>%
  clean_names() %>%
  select(data_zone, simd2020v2hb_decile)

# I chose to use GP Practices and List Sizes from October 2022, as this was the closest dataset I could find which correlated with the final year of my prescriptions dataset:
gp_addresses <- read_csv("https://www.opendata.nhs.scot/dataset/f23655c3-6e23-4103-a511-a80d998adb90/resource/1a15cb34-fcf9-4d3f-ad63-1ba3e675fbe2/download/practice_contactdetails_oct2022-open-data.csv") %>%
  clean_names() %>%
  select(practice_code, gp_practice_name, data_zone)

# Create ECP_GP by using the GP_addresses dataset to map the GP practice code to a datazone. I then used the column datazone to full_join() the SIMD dataset to the prescriptions dataset.
ECP_GP <- ECP_scripts %>%
  filter(!gp_practice %in% c(99996, 99997, 99998)) %>% # remove dummy GP practice codes as they do not have a known gp practice code so cannot be mapped to a SIMD.
  left_join(gp_addresses, by = c("gp_practice" = "practice_code")) %>%
  #left_join(data_zones, by = "data_zone") %>%
  left_join(SIMD, by = "data_zone") %>%
  drop_na(simd2020v2hb_decile) %>% # need SIMD value to for plot
  group_by(gp_practice) %>% 
  mutate(total_quantity_per_gp = sum(paid_quantity)) %>% 
  clean_names()

# calculate the number of GPs per SIMD (account for each SIMD having a different number of GPs) 
toal_no_GP_per_SIMD <- ECP_GP %>% 
  group_by(simd2020v2hb_decile) %>%
  mutate(unique_gp_count_per_SIMD = n_distinct(gp_practice))

Figure 2:

ECP_SIMD_barchart <- toal_no_GP_per_SIMD %>%
  group_by(simd2020v2hb_decile, drug) %>% 
  summarise(prescriptions_gp = (total_quantity_per_gp / unique_gp_count_per_SIMD), # calculate the prescriptions per GP for each SIMD decile and drug
    .groups = "drop") %>%  # Ungroup data after summarisation
  ggplot(aes(x = factor(simd2020v2hb_decile, levels = 1:10),y = prescriptions_gp,  fill = drug, text = paste("Decile:", simd2020v2hb_decile, "<br>Drug:", drug))) + # customise hover box for interactive chart
  geom_col() +
  scale_fill_manual(values = c("#4575b4", "#91bfdb"), name = "Drug Type") +  # add colour palette
  labs(title = "Emergency Contraception Prescriptions by \n Scottish Index of Multiple Deprivation (SIMD) 2019 to 2022, \n normalised by number of GPs per SIMD decile.",
    x = "SIMD Decile \n (1 = Most Deprived, 10 = Least Deprived)",
    y = "Prescriptions",
    fill = "Drug") +
  theme_minimal(base_size = 12) +
  theme(plot.title = element_text(size = 14, face = "bold", hjust = 0.5),
    plot.subtitle = element_text(size = 12, hjust = 0.5),
    axis.title.x = element_text(size = 12),
    axis.title.y = element_text(size = 12),
    axis.text.x = element_text(size = 10, hjust = 1),
    axis.text.y = element_text(size = 10),
    legend.position = "right",
    legend.title = element_text(size = 12, face = "bold"),
    legend.text = element_text(size = 10))

ECP_SIMD_barchart <- ggplotly(ECP_SIMD_barchart, tooltip = "text") # make plot interactive
ECP_SIMD_barchart

Is there geographical variation in prescribing practices of ECP by Health Board region?

I wanted to explore if there was any correlation between the type of contraception being prescribed and deprivation. To do this I made a ratio of levonorgestrel to total emergency contraception prescriptions.

\[ \frac{Levonorgestrel}{Levonorgestrel + Ulipristal Acetate} \]

This is important to identify any variation in prescribing patterns in more deprived areas. Differences may suggest patients access health services later, so have to use Ulipristal Acetate, or that GPs or Pharmacists have differences in prescribing preferences in more deprived areas.

Figure 3:

# load the NHS Health board Shapefile downloaded from learn page
NHS_healthboards <- st_read(here("data", "NHS_healthboards_2019.shp")) %>% 
  mutate(HBName = paste("NHS", HBName)) %>% # format to match ECP_scripts dataset
  clean_names()
## Reading layer `NHS_healthboards_2019' from data source 
##   `C:\Data_science\B273025\data\NHS_healthboards_2019.shp' using driver `ESRI Shapefile'
## Simple feature collection with 14 features and 4 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 7564.996 ymin: 530635.8 xmax: 468754.8 ymax: 1218625
## Projected CRS: OSGB36 / British National Grid
# calculate the ratio of Lev to Uli prescribed 
variation_ECP_prescribed <- ECP_scripts %>% 
  group_by(hb_name,drug) %>% 
  # calculate the total of Lev and Uli prescribed per health board
  summarise(total_each_drug_type = sum(paid_quantity, na.rm = TRUE)) %>%
  drop_na(drug) %>% 
  #pivot_wider to move drug names to columns 
  pivot_wider(names_from = drug, values_from = total_each_drug_type) %>%
  clean_names() %>% # consistency 
  mutate(levo_to_uli_ratio = levonorgestrel / (levonorgestrel + ulipristal_acetate)) # calculate ratio 

# Join spatial data with variation_ECP_prescribed
variation_ECP_prescribed <- NHS_healthboards %>%
  left_join(variation_ECP_prescribed)

#Create map in ggplot
map_variation_ECP_prescribed <- variation_ECP_prescribed %>%
  ggplot(aes(fill = levo_to_uli_ratio)) +
  geom_sf(size = 0.1, colour = "grey50", alpha =0.9) +
  scale_fill_distiller(palette = "Blues", direction = 1) +
  labs(title = "Geographical variation in Emergency Contraceptive \n Pill prescribing in Scotland.", 
       subtitle = "Heatmap showing Levonorgestrel prescriptions as a proportion of total \n Emergency Contraceptive Pill (ECP) prescriptions by Health Board region", 
       fill = "Levonorgestrel to ECP ratio", caption = "Data Source: Public Health Scotland Prescriptions in the Community, 2019-2022") +
  coord_sf() +
  theme_void() +
  theme(
    plot.title = element_text(face = "bold", size = 18, hjust=0.5), 
    plot.subtitle = element_text(size = 14, hjust=0.5), 
    legend.title = element_text(size = 12),
    legend.text = element_text(size = 9, hjust=0.5),
    legend.direction = "vertical",
    legend.box = "horizontal") +
  annotation_scale(location = "tl") +
  annotation_north_arrow(
    location = "tl",    
    pad_y = unit(0.5, "in"),    
    style = north_arrow_nautical(fill = c("grey40", "white"),line_col = "grey20"))

map_variation_ECP_prescribed

Conclusion

prescribing choices are determined by geography, deprivation, age

Recommendations from analysis

Limitations of the dataset and suggestions for future analysis

additional time and data - directed acyclical graphic to explore this further explore the statistical relationships.

References

I used https://gsverhoeven.github.io/post/zotero-rmarkdown-csl/ to set up citations. Chosen BMJ style as common.

---
title: "Assessment: Exploratory analaysis of the factors affecting Emergency Contraceptive prescribing in Scotland"
author: "B273025"
date: "`r Sys.Date()`"
bibliography: "../zotero_citations.json"
link-citations: true
output: 
  html_document: 
    code_download: true
    toc: true
    theme: flatly

---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE, cache=FALSE, fig.align = "center", fig.width = 10, fig.height = 7) # align figures to centre and set dimensions to prevent them being cut off 
```

# Introduction 

Preventing unintended pregnancy is important for both individual wellbeing and public health. Unprotected sex occurs for multiple, complex reasons including sexual assault, reproductive coercion, contraceptive mishaps, and lapses in contraceptive use. Emergency contraception is used to prevent pregnancy following unprotected sex or contraceptive failure. In Scotland emergency contraception can be obtained via a pill or intrauterine device. 

Currently in Scotland there are two types of emergency contraceptive pill (ECP) prescribed by pharmacies. These are:

>* **Levonorgestrel ** (brand name: *Levonelle*) - Effective within 72 hours (three days) of unprotected sex
>* **Ulipristal Acetate ** (brand name: *ellaOne*) - Effective within 120 hours (five days) of unprotected sex  

This analysis explores prescribing patterns of ECP using the Public Health Scotland Community Prescriptions Database from 2019 to 2022. Understanding the prescribing patterns of ECP is essential for identifying disparities in access and improving public health strategies. This report aims to identify factors influencing ECP prescribing in Scotland through answering: 

> 1) What are the trends in prescribing of ECP between 2019 and 2022? 
> 2) How does the ECP prescription rate vary across Health Board regions in Scotland?
> 3) Are pharmacies in more deprived areas prescribing more ECP than those in affluent areas?
> 4) Is there geographical variation in prescribing practices of ECP by Health Board region?


```{r}
# load necessary libraries
library(tidyverse)
library(janitor) 
library(gt) 
library(here)
library(lubridate)
library(sf)
library(ggspatial)
library(plotly)
```

Read in datasets:
```{r}
# Read in the Health Board names (HB_names). See code for link.
HB_names <- read_csv("https://www.opendata.nhs.scot/dataset/9f942fdb-e59e-44f5-b534-d6e17229cc7b/resource/652ff726-e676-4a20-abda-435b98dd7bdc/download/hb14_hb19.csv") %>% 
  clean_names() # ensures column names are unique, lower case and spaces and special characters are replaced with underscores. I will use this function for all read ins to ensure my data is consistent and easy to manipulate.

# Read in population data per Health Board from the 2022 census data. Available from: https://statistics.ukdataservice.ac.uk/dataset/scotland-s-census-2022-uv102a-age-by-sex/resource/b2d295c2-af53-4b3d-a075-7815cadd9060 
all_population_data <- read_csv(here("data", "UV103_age_health_board_census.csv"), skip = 10) %>% # locates the csv and excludes the first 10 lines in the csv as they are redundant
  rename(Spare = "...6", # remove unused columns
         hb_name = "Health Board Area 2019",
         hb_population = Count) %>% # rename() formats the data to match the prescriptions dataframe
  filter(Sex == "Female") %>% # filter female population, as men do not take ECP
  select(hb_name, Age, hb_population) %>% # select columns of interest
  mutate(hb_name = paste("NHS", hb_name)) %>%  # change hb_name column format to match the Health Board dataframe format
  clean_names()

# filter dataset for the total female population for each health board:
population_data <- all_population_data %>% 
  filter(age == "All people") %>% 
  select(hb_name, hb_population)

# select the population aged 16 to 34 years as this is the population the population most likely engaging with risk taking behaviors such as unprotected sex. I chose 16 as this is the age of consent in Scotland, and 34 as my cut off for young adults. 
young_population_data <- all_population_data %>% 
  filter(age%in% c(16:34)) %>% # filter ages of interest
  mutate(age = as.numeric(age)) %>% # make numerical so they can be placed in to buckets
  mutate(age_group = case_when(between(age, 16,34)~'16-34')) %>% # make bucket 16-34 years
  group_by(hb_name, age_group) %>% 
  summarise(pop_hb_16_34 = sum(hb_population)) # sum total population aged 16 to 34 per health board region. 

```

Define a function to read in the prescription data for a defined year:
```{r}
#For efficiency I created a function to read in the prescriptions datasets. I downloaded 12 months of prescription data for each year from 2019 to 2022. I placed the 12 months of data into their relevant folder named all_months_year, where year was specific to the data it contained.
read_all_prescriptions <- function(year){
  all_files <- list.files(here("data", paste0("all_months_", year)), pattern = "csv") #list.files() retrieves files from the relevant directory, and the paste0() dynamically constructs the folder name based on the year variable placed into the function.
  all_prescriptions <- all_files %>%
    map_dfr(~read_csv(here("data", paste0("all_months_", year),.))) %>% #map_dfr() row-binds the datasets being read in
    clean_names() %>%
    drop_na(bnf_item_description) # drop the rows with missing bnf_item values
  return(all_prescriptions)}
```

Read in datasets using read_all_prescriptions() function and start data wrangling:
```{r}
# all_prescriptions_2019 <- read_all_prescriptions(2019)
# all_prescriptions_2020 <- read_all_prescriptions(2020)
# all_prescriptions_2021 <- read_all_prescriptions(2021)
# all_prescriptions_2022 <- read_all_prescriptions(2022)
# 
# # 2019 dataset has hbt2014 as a column name instead of hbt. Rename to make column name consistent
# all_prescriptions_2019 <- all_prescriptions_2019 %>%
#   rename(hbt = "hbt2014")
# 
# # combine 4 years of data into one dataset to make it easier to wrangle
# combined_prescriptions <- bind_rows(
#   all_prescriptions_2019,
#   all_prescriptions_2020,
#   all_prescriptions_2021,
#   all_prescriptions_2022 )%>% # I mutate and select the prescriptions of interest to prevent a very large dataset from being stored in my environment (prevents R slowing down and crashing)
#   mutate(
#     date = parse_date_time(paid_date_month, "ym"), # use lubridate to format date
#     drug_simple = case_when(
#       str_detect(bnf_item_description, "LEVONO") ~ "Levonorgestrel",
#       str_detect(bnf_item_description, "ULIPR") ~ "Ulipristal Acetate",
#       TRUE ~ "Other")) %>% # used case_when() to select ECP even when they have different dosages
#   filter(drug_simple != "Other",!is.na(date)) %>% #remove prescriptions of no interest and any missing date values
#   filter(hbt != "SB0806") %>% # filter out SB0806 as it is not a health board (Scottish Ambulance Service)
#   filter(!is.na(hbt))

# save combined_prescriptions dataset to a csv in my directory, to prevent many large datasets being stored in my environment and slowing down R. 
# write_csv(combined_prescriptions,"data/combined_prescriptions.csv")
combined_prescriptions <- read_csv(here("data","combined_prescriptions.csv"))
```

Join and wrangle data:
```{r}
# join the prescriptions dataset to the Health Board names and health board population datasets
ECP_scripts <- combined_prescriptions %>% 
  full_join(HB_names, by = c("hbt" = "hb")) %>% # join with Health Board names
  full_join(population_data, by = "hb_name") %>% # join with population data
  select(gp_practice, date, drug = drug_simple, hb_name, paid_quantity,hb_population) %>%  # select columns of interest
  mutate(month = factor(month(date), levels = 1:12, labels = month.abb), .after = date) %>%   
  mutate(year = factor(year(date)), .after = month) # extract month and year as a factor with labels to help when plotting
```

# Key Results 

## What are the trends in prescribing of ECP between 2019 and 2022? 

This plot explores if there is monthly variation in prescribing rate of ECP for the years 2019 to 2022. I included 4 years of consecutive data to see if a one-time event, such as the Covid-19 pandemic or the Common Wealth Games caused any changes in emergency contraception prescribing rate.
I wanted to see if there was seasonal trends in ECP prescriptions coinciding with events such as Valentines day, University start dates, or national holidays. 
I used population-weighted prescribing rates to avoid bias.

Figure 1:
```{r}
total_population <- sum(population_data$hb_population, na.rm=TRUE) # calculate total population of Scotland

# calculate the number of prescriptions of each drug per month
ECP_by_month <- ECP_scripts %>%
  group_by(date, drug, month, year) %>%
  summarise(total_quantity_month = sum(paid_quantity, na.rm = TRUE)) %>%
  ungroup() %>% 
  drop_na(drug) %>% 
  clean_names()

# calculate prescription rate per 100,000 people for each drug and month
ECP_trend_plot_data <- ECP_by_month %>%
  group_by(drug, year, month) %>% 
  summarise(prescriptions_per_100000 = (total_quantity_month / total_population)*100000)

ECP_trend_plot <- ECP_trend_plot_data %>%
  ggplot(aes(x = month, y = prescriptions_per_100000, group = year, linetype = year, color = year)) + # plot data for each year on same graph
  geom_line(aes(size = ifelse(year == "2020", 0.9, 0.7))) + # make the 2020 line slightly thicker to make it stand out 
  facet_wrap(~drug, scales = "free_y") + # separate the two types of ECP
  scale_linetype_manual(values = c("2019" = "dotdash", "2020" = "solid", "2021" = "dashed", "2022" = "dotted"),name = "Year") + # make each year identifiable by linte type
  scale_color_manual(values = c("2019" = "grey70", "2020" = "#4575b4", "2021" = "grey70", "2022" = "grey70"),name = "Year") + # make Covid-19 year blue (2020 had the longest lockdown, March till May)
  scale_size_identity() + # ensures ggplot uses the size values as outlined in geom_line()
  scale_y_continuous(limits = c(0, NA), breaks = scales::pretty_breaks(n = 5))+ # ensure y-axis starts at 0 and has evenly spaced breaks 
  labs(
    title = "Trends in Emergency Contraception Prescribing 2019–2022 \n in Scotland.",
    subtitle = "Four years of population adjusted prescribing rates\n  highlighting the impact of Covid-19 (blue)",
    x = "Month",
    y = "Prescriptions per 100,000 women") +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 18, hjust = 0.5, face = "bold"),
    plot.subtitle = element_text(size = 14, hjust = 0.5),
    axis.title.x = element_text(size = 12),
    axis.title.y = element_text(size = 12),
    axis.text.x = element_text(angle = 45, hjust = 1, size = 9),
    axis.text.y = element_text(size = 9),
    strip.text = element_text(size = 12),
    legend.title = element_text(size = 12),
    legend.text = element_text(size = 12),
    legend.position = "bottom",
    plot.margin = margin(10, 30, 10, 30)) # increase size of margins

ECP_trend_plot
```

Due to low prescribing rates of ECP there is stochastic variation on the graphical representation of the data. Interesting trends to note include:

* Levonorgestrel is more commonly prescribed than Ulipristal Acetate, with levonorgestrel having a mean prescription rate of 150 presciptions of Levonorgestrel per 100,000 women each month compared to Ulipristal Acetate having 3 prescriptions per 100,000 women per month. 

* Prescribing rate of Levonorgestrel has decreased each year between 2019 to 2022. 

* There is a large decrease in the prescription rate of Ulipristal Acetate in 2019 and all consecutive years. This likely reflects the change in prescription guidance in 2020, where the European Medicines Agency recommended withdrawing Ulipristal Acetate as a treatment for uterine fibroids due to safety concerns. Scotland adopted this guidance, and the sharp decrease is due to pharmacies no longer dispensing Ulipristal acetate as a treatment for uterine fibroids. The only prescriptions of Ulipristal Acetate from 2020 onwards is for ECP.  

* There are peaks in the Summer months suggesting there may be a 'Summer of Love'.

* Between February and May of 2020 the prescription rate of Levonorgestrel decreases. This is likely due to decreased sexual activity and risk taking behaviours due to the stringent national 'lockdown' measures introduced between X to Y in response to the Covid-19 pandemic. 

## How does the ECP prescription rate vary across Health Board regions in Scotland? 

This figure explores if there are differences in the average annual prescription rate for each Health Board in Scotland. To explore possible reasons for differences I calculated the proportion of young people in each Health Board to see if there was a trend. 

Wrangle data to get the columns and values of interest: 
```{r}
ECP_anual_rate_data <- ECP_scripts %>% 
  group_by(hb_name, drug) %>% # aggregate data by Health Board and drug
  summarise(total_quantity_4_years = sum(paid_quantity, na.rm = TRUE), # total prescriptions over 4 years
    avg_annual_total_quantity = total_quantity_4_years / 4, # average annual prescriptions
    hb_population = first(hb_population), # hb_population is consistent within each hb_name
    .groups = "drop") %>% # ungroup data
  drop_na(drug) %>% # remove rows with no drug values
  mutate(avg_annual_presc_100000 = avg_annual_total_quantity * 100000 / hb_population) %>% # average annual rate of prescriptions per 100,000
  select(hb_name, drug, avg_annual_presc_100000, hb_population) %>% # select columns of interest before pivot
  pivot_wider(names_from = drug, values_from = avg_annual_presc_100000, names_glue = "{drug}_rate") %>% # Rename columns for clarity
  clean_names() # clean column names following pivot

# calculate total ECP prescription rate per health board
ECP_anual_rate_data <- ECP_anual_rate_data %>% 
  mutate(total_ECP_rate = rowSums(select(., levonorgestrel_rate, ulipristal_acetate_rate), na.rm = TRUE)) %>%  #sum rates 
  arrange(desc(total_ECP_rate)) # arrange by total rate in descending order

#calculate percentage young people per health board 
ECP_anual_rate_data_young <- ECP_anual_rate_data %>% 
  full_join(young_population_data) %>% # to get the population of those aged 16-34 years 
  group_by(hb_name) %>% 
  mutate(prop_young_ppl_hb = pop_hb_16_34/hb_population) %>% # calculate proportion of young people in each health board 
  ungroup()
```

Table 1:
```{r}
annual_avg_ECP_table <- ECP_anual_rate_data_young %>% 
  select(hb_name, levonorgestrel_rate, ulipristal_acetate_rate, total_ECP_rate,prop_young_ppl_hb) %>% # Select relevant columns
  gt() %>% 
  cols_label(hb_name = "Health Board",
             total_ECP_rate= "Total",
             levonorgestrel_rate= " Levonorgestrel",
             ulipristal_acetate_rate=" Ulipristal Acetate",
             prop_young_ppl_hb = "% Aged 16-34 Years") %>% # rename columns to make reader-friendly
  fmt_number(columns = c(levonorgestrel_rate, ulipristal_acetate_rate, total_ECP_rate, prop_young_ppl_hb), decimals = 0) %>% # no decimal points as false accuracy detracts from the message in the data
  cols_align(align = "center", columns = c(levonorgestrel_rate,ulipristal_acetate_rate,total_ECP_rate, prop_young_ppl_hb)) %>%  # centre column names
  grand_summary_rows(columns = c(levonorgestrel_rate,ulipristal_acetate_rate, total_ECP_rate), fns = list("Overall Average" = ~mean(., na.rm = TRUE)), fmt = list(~ fmt_number(., decimals = 0))) %>% # add an overall average for the rate columns
  fmt_percent(columns = prop_young_ppl_hb, decimals = 0) %>%  # add percentage sign
  tab_header(title = md("**Average Annual Rate of Emergency Contraception Prescriptions by Health Board in Scotland.**"),
             subtitle = md("Rate per 100,000 women, derived from the mean prescription rates across the years 2019 to 2022. Health Boards are ranked in descending order.")) %>% # add a title and subtitle; md() allows text formatting from mark down 
  tab_spanner(label = md("*Prescription rate per 100,000 women*"), columns = c(levonorgestrel_rate,ulipristal_acetate_rate, total_ECP_rate)) %>% # add a title to the rate columns.
  tab_source_note(md("*Data from Public Health Scotland. Available from: (https://www.opendata.nhs.scot/dataset/prescriptions-in-the-community)*")) %>% 
  tab_stubhead(md("**2019-2022**")) %>% 
  tab_footnote(footnote = "includes Capital City, Edinburgh", locations = cells_body(columns = hb_name, rows = 2))%>% 
  opt_stylize(style = 6, color = "blue")

annual_avg_ECP_table
```

Generally the health boards with the highest annual ECP prescription rate also had the highest proportion of young people living in their health board region. This suggests that young people are more likely using the ECP pharmacy service. Notably the top two Health Boards for prescribing ECP contain major Scottish cities, Glasgow and Edinburgh.

NHS Ayrshire and Arran is an outlier to this trend having the third highest ECP prescription rate, and yet a relatively smaller proportion of young people living in this health board (19%). This needs further exploration, perhaps indicating that under 16s or over 35s are using the ECP services more in this region. 

More remote Health Boards such as NHS Highland and NHS borders had the lowest rate of ECP prescribing. They also had a smaller proportion of young people living in their health board. 

When comparing the types of ECP prescribed generally levonorgestrel prescribed significantly more than Ulipristal Acetate. However this is not the case in the NHS Western Isles Health Board, where a similar proportion of Levonorgestrel is prescribed to Ulipristal Acetate. This trend is explored more in figure 4 to see whether there is geographical variation in the type of ECP prescribed.

## Are pharmacies in more deprived areas prescribing more ECP than those in affluent areas?

This figure explores if 

To measure deprivation I have used the Scottish Index of Multiple Deprivation. I took the SIMD 2020v2 dataset from Public Health Scotland website [available from: https://www.opendata.nhs.scot/gl/dataset/scottish-index-of-multiple-deprivation/resource/acade396-8430-4b34-895a-b3e757fa346e ] as this dataset contains SIMD decile weighted per Health Board population.

When interpreting SIMD in deciles rank 1 is considered the most deprived, and rank 10 is least deprived. 

```{r}
# read in datasets: (link available in code)
SIMD <- read_csv("https://www.opendata.nhs.scot/gl/dataset/78d41fa9-1a62-4f7b-9edb-3e8522a93378/resource/acade396-8430-4b34-895a-b3e757fa346e/download/simd2020v2_22062020.csv") %>%
  clean_names() %>%
  select(data_zone, simd2020v2hb_decile)

# I chose to use GP Practices and List Sizes from October 2022, as this was the closest dataset I could find which correlated with the final year of my prescriptions dataset:
gp_addresses <- read_csv("https://www.opendata.nhs.scot/dataset/f23655c3-6e23-4103-a511-a80d998adb90/resource/1a15cb34-fcf9-4d3f-ad63-1ba3e675fbe2/download/practice_contactdetails_oct2022-open-data.csv") %>%
  clean_names() %>%
  select(practice_code, gp_practice_name, data_zone)

# Create ECP_GP by using the GP_addresses dataset to map the GP practice code to a datazone. I then used the column datazone to full_join() the SIMD dataset to the prescriptions dataset.
ECP_GP <- ECP_scripts %>%
  filter(!gp_practice %in% c(99996, 99997, 99998)) %>% # remove dummy GP practice codes as they do not have a known gp practice code so cannot be mapped to a SIMD.
  left_join(gp_addresses, by = c("gp_practice" = "practice_code")) %>%
  #left_join(data_zones, by = "data_zone") %>%
  left_join(SIMD, by = "data_zone") %>%
  drop_na(simd2020v2hb_decile) %>% # need SIMD value to for plot
  group_by(gp_practice) %>% 
  mutate(total_quantity_per_gp = sum(paid_quantity)) %>% 
  clean_names()

# calculate the number of GPs per SIMD (account for each SIMD having a different number of GPs) 
toal_no_GP_per_SIMD <- ECP_GP %>% 
  group_by(simd2020v2hb_decile) %>%
  mutate(unique_gp_count_per_SIMD = n_distinct(gp_practice))
```

Figure 2:
```{r}
ECP_SIMD_barchart <- toal_no_GP_per_SIMD %>%
  group_by(simd2020v2hb_decile, drug) %>% 
  summarise(prescriptions_gp = (total_quantity_per_gp / unique_gp_count_per_SIMD), # calculate the prescriptions per GP for each SIMD decile and drug
    .groups = "drop") %>%  # Ungroup data after summarisation
  ggplot(aes(x = factor(simd2020v2hb_decile, levels = 1:10),y = prescriptions_gp,  fill = drug, text = paste("Decile:", simd2020v2hb_decile, "<br>Drug:", drug))) + # customise hover box for interactive chart
  geom_col() +
  scale_fill_manual(values = c("#4575b4", "#91bfdb"), name = "Drug Type") +  # add colour palette
  labs(title = "Emergency Contraception Prescriptions by \n Scottish Index of Multiple Deprivation (SIMD) 2019 to 2022, \n normalised by number of GPs per SIMD decile.",
    x = "SIMD Decile \n (1 = Most Deprived, 10 = Least Deprived)",
    y = "Prescriptions",
    fill = "Drug") +
  theme_minimal(base_size = 12) +
  theme(plot.title = element_text(size = 14, face = "bold", hjust = 0.5),
    plot.subtitle = element_text(size = 12, hjust = 0.5),
    axis.title.x = element_text(size = 12),
    axis.title.y = element_text(size = 12),
    axis.text.x = element_text(size = 10, hjust = 1),
    axis.text.y = element_text(size = 10),
    legend.position = "right",
    legend.title = element_text(size = 12, face = "bold"),
    legend.text = element_text(size = 10))

ECP_SIMD_barchart <- ggplotly(ECP_SIMD_barchart, tooltip = "text") # make plot interactive
ECP_SIMD_barchart
```

## Is there geographical variation in prescribing practices of ECP by Health Board region?

I wanted to explore if there was any correlation between the type of contraception being prescribed and deprivation. To do this I made a ratio of levonorgestrel  to total emergency contraception prescriptions. 

$$
\frac{Levonorgestrel}{Levonorgestrel + Ulipristal Acetate}
$$


This is important to identify any variation in prescribing patterns in more deprived areas. Differences may suggest patients access health services later, so have to use Ulipristal Acetate, or that GPs or Pharmacists have differences in prescribing preferences in more deprived areas.

Figure 3:
```{r}
# load the NHS Health board Shapefile downloaded from learn page
NHS_healthboards <- st_read(here("data", "NHS_healthboards_2019.shp")) %>% 
  mutate(HBName = paste("NHS", HBName)) %>% # format to match ECP_scripts dataset
  clean_names()

# calculate the ratio of Lev to Uli prescribed 
variation_ECP_prescribed <- ECP_scripts %>% 
  group_by(hb_name,drug) %>% 
  # calculate the total of Lev and Uli prescribed per health board
  summarise(total_each_drug_type = sum(paid_quantity, na.rm = TRUE)) %>%
  drop_na(drug) %>% 
  #pivot_wider to move drug names to columns 
  pivot_wider(names_from = drug, values_from = total_each_drug_type) %>%
  clean_names() %>% # consistency 
  mutate(levo_to_uli_ratio = levonorgestrel / (levonorgestrel + ulipristal_acetate)) # calculate ratio 

# Join spatial data with variation_ECP_prescribed
variation_ECP_prescribed <- NHS_healthboards %>%
  left_join(variation_ECP_prescribed)

#Create map in ggplot
map_variation_ECP_prescribed <- variation_ECP_prescribed %>%
  ggplot(aes(fill = levo_to_uli_ratio)) +
  geom_sf(size = 0.1, colour = "grey50", alpha =0.9) +
  scale_fill_distiller(palette = "Blues", direction = 1) +
  labs(title = "Geographical variation in Emergency Contraceptive \n Pill prescribing in Scotland.", 
       subtitle = "Heatmap showing Levonorgestrel prescriptions as a proportion of total \n Emergency Contraceptive Pill (ECP) prescriptions by Health Board region", 
       fill = "Levonorgestrel to ECP ratio", caption = "Data Source: Public Health Scotland Prescriptions in the Community, 2019-2022") +
  coord_sf() +
  theme_void() +
  theme(
    plot.title = element_text(face = "bold", size = 18, hjust=0.5), 
    plot.subtitle = element_text(size = 14, hjust=0.5), 
    legend.title = element_text(size = 12),
    legend.text = element_text(size = 9, hjust=0.5),
    legend.direction = "vertical",
    legend.box = "horizontal") +
  annotation_scale(location = "tl") +
  annotation_north_arrow(
    location = "tl",    
    pad_y = unit(0.5, "in"),    
    style = north_arrow_nautical(fill = c("grey40", "white"),line_col = "grey20"))

map_variation_ECP_prescribed
```

# Conclusion 

prescribing choices are determined by geography, deprivation, age 

## Recommendations from analysis 

## Limitations of the dataset and suggestions for future analysis
additional time and data - directed acyclical graphic to explore this further
explore the statistical relationships. 

# References 
I used https://gsverhoeven.github.io/post/zotero-rmarkdown-csl/ to set up citations. Chosen BMJ style as common. 

